49 research outputs found
HitFraud: A Broad Learning Approach for Collective Fraud Detection in Heterogeneous Information Networks
On electronic game platforms, different payment transactions have different
levels of risk. Risk is generally higher for digital goods in e-commerce.
However, it differs based on product and its popularity, the offer type
(packaged game, virtual currency to a game or subscription service), storefront
and geography. Existing fraud policies and models make decisions independently
for each transaction based on transaction attributes, payment velocities, user
characteristics, and other relevant information. However, suspicious
transactions may still evade detection and hence we propose a broad learning
approach leveraging a graph based perspective to uncover relationships among
suspicious transactions, i.e., inter-transaction dependency. Our focus is to
detect suspicious transactions by capturing common fraudulent behaviors that
would not be considered suspicious when being considered in isolation. In this
paper, we present HitFraud that leverages heterogeneous information networks
for collective fraud detection by exploring correlated and fast evolving
fraudulent behaviors. First, a heterogeneous information network is designed to
link entities of interest in the transaction database via different semantics.
Then, graph based features are efficiently discovered from the network
exploiting the concept of meta-paths, and decisions on frauds are made
collectively on test instances. Experiments on real-world payment transaction
data from Electronic Arts demonstrate that the prediction performance is
effectively boosted by HitFraud with fast convergence where the computation of
meta-path based features is largely optimized. Notably, recall can be improved
up to 7.93% and F-score 4.62% compared to baselines.Comment: ICDM 201
Mining Brain Networks using Multiple Side Views for Neurological Disorder Identification
Mining discriminative subgraph patterns from graph data has attracted great
interest in recent years. It has a wide variety of applications in disease
diagnosis, neuroimaging, etc. Most research on subgraph mining focuses on the
graph representation alone. However, in many real-world applications, the side
information is available along with the graph data. For example, for
neurological disorder identification, in addition to the brain networks derived
from neuroimaging data, hundreds of clinical, immunologic, serologic and
cognitive measures may also be documented for each subject. These measures
compose multiple side views encoding a tremendous amount of supplemental
information for diagnostic purposes, yet are often ignored. In this paper, we
study the problem of discriminative subgraph selection using multiple side
views and propose a novel solution to find an optimal set of subgraph features
for graph classification by exploring a plurality of side views. We derive a
feature evaluation criterion, named gSide, to estimate the usefulness of
subgraph patterns based upon side views. Then we develop a branch-and-bound
algorithm, called gMSV, to efficiently search for optimal subgraph features by
integrating the subgraph mining process and the procedure of discriminative
feature selection. Empirical studies on graph classification tasks for
neurological disorders using brain networks demonstrate that subgraph patterns
selected by the multi-side-view guided subgraph selection approach can
effectively boost graph classification performances and are relevant to disease
diagnosis.Comment: in Proceedings of IEEE International Conference on Data Mining (ICDM)
201
Learning from Multi-View Multi-Way Data via Structural Factorization Machines
Real-world relations among entities can often be observed and determined by
different perspectives/views. For example, the decision made by a user on
whether to adopt an item relies on multiple aspects such as the contextual
information of the decision, the item's attributes, the user's profile and the
reviews given by other users. Different views may exhibit multi-way
interactions among entities and provide complementary information. In this
paper, we introduce a multi-tensor-based approach that can preserve the
underlying structure of multi-view data in a generic predictive model.
Specifically, we propose structural factorization machines (SFMs) that learn
the common latent spaces shared by multi-view tensors and automatically adjust
the importance of each view in the predictive model. Furthermore, the
complexity of SFMs is linear in the number of parameters, which make SFMs
suitable to large-scale problems. Extensive experiments on real-world datasets
demonstrate that the proposed SFMs outperform several state-of-the-art methods
in terms of prediction accuracy and computational cost.Comment: 10 page
Private Model Compression via Knowledge Distillation
The soaring demand for intelligent mobile applications calls for deploying
powerful deep neural networks (DNNs) on mobile devices. However, the
outstanding performance of DNNs notoriously relies on increasingly complex
models, which in turn is associated with an increase in computational expense
far surpassing mobile devices' capacity. What is worse, app service providers
need to collect and utilize a large volume of users' data, which contain
sensitive information, to build the sophisticated DNN models. Directly
deploying these models on public mobile devices presents prohibitive privacy
risk. To benefit from the on-device deep learning without the capacity and
privacy concerns, we design a private model compression framework RONA.
Following the knowledge distillation paradigm, we jointly use hint learning,
distillation learning, and self learning to train a compact and fast neural
network. The knowledge distilled from the cumbersome model is adaptively
bounded and carefully perturbed to enforce differential privacy. We further
propose an elegant query sample selection method to reduce the number of
queries and control the privacy loss. A series of empirical evaluations as well
as the implementation on an Android mobile device show that RONA can not only
compress cumbersome models efficiently but also provide a strong privacy
guarantee. For example, on SVHN, when a meaningful
-differential privacy is guaranteed, the compact model trained
by RONA can obtain 20 compression ratio and 19 speed-up with
merely 0.97% accuracy loss.Comment: Conference version accepted by AAAI'1
Sequential Keystroke Behavioral Biometrics for Mobile User Identification via Multi-view Deep Learning
With the rapid growth in smartphone usage, more organizations begin to focus
on providing better services for mobile users. User identification can help
these organizations to identify their customers and then cater services that
have been customized for them. Currently, the use of cookies is the most common
form to identify users. However, cookies are not easily transportable (e.g.,
when a user uses a different login account, cookies do not follow the user).
This limitation motivates the need to use behavior biometric for user
identification. In this paper, we propose DEEPSERVICE, a new technique that can
identify mobile users based on user's keystroke information captured by a
special keyboard or web browser. Our evaluation results indicate that
DEEPSERVICE is highly accurate in identifying mobile users (over 93% accuracy).
The technique is also efficient and only takes less than 1 ms to perform
identification.Comment: 2017 Joint European Conference on Machine Learning and Knowledge
Discovery in Database